Cuckoo hashing

Cuckoo hashing is a scheme in computer programming for resolving hash collisions of values of hash functions in a table. Cuckoo hashing was first described by Rasmus Pagh and Flemming Friche Rodler in 2001.[1] The name derives from the behavior of some species of cuckoo, where the cuckoo chick pushes the other eggs or young out of the nest when it hatches.

Contents

Theory

The basic idea is to use two hash functions instead of only one. This provides two possible locations in the hash table for each key. In one of the commonly used variants of the algorithm, the hash table is split into two smaller tables of equal size, and each hash function provides an index into one of these two tables.

When a new key is inserted, a greedy algorithm is used: The new key is inserted in one of its two possible locations, "kicking out", that is, displacing, any key that might already reside in this location. This displaced key is then inserted in its alternative location, again kicking out any key that might reside there, until a vacant position is found, or the procedure enters an infinite loop. In the latter case, the hash table is rebuilt in-place[2] using new hash functions.

Lookup requires inspection of just two locations in the hash table, which takes constant time in the worst case (see Big O notation). This is in contrast to many other hash table algorithms, which may not have a constant worst-case bound on the time to do a lookup.

It can also be shown that insertions succeed in expected constant time,[1] even considering the possibility of having to rebuild the table, as long as the number of keys is kept below half of the capacity of the hash table, i.e., the load factor is below 50%. One method of proving this uses the theory of random graphs: one may form an undirected graph called the "Cuckoo Graph" that has a vertex for each hash table location, and an edge for each hashed value, with the endpoints of the edge being the two possible locations of the value. Then, the greedy insertion algorithm for adding a set of values to a cuckoo hash table succeeds if and only if the Cuckoo Graph for this set of values is a pseudoforest, a graph with at most one cycle in each of its connected components. This property is true with high probability for a random graph in which the number of edges is less than half the number of vertices.[3]

Example

The following hashfunctions are given:

h\left(k\right)=k\mod 11
h'\left(k\right)=\lfloor\frac{k}{11}\rfloor\mod 11

k h(k) h'(k)
20 9 1
50 6 4
53 9 4
75 9 6
100 1 9
67 1 6
105 6 9
3 3 0
36 3 3
39 6 3
1. table for h(k)
20 50 53 75 100 67 105 3 36 39
0
1 100 67 67 67 67 100
2
3 3 3 36
4
5
6 50 50 50 50 50 105 105 105 50
7
8
9 20 20 20 20 20 20 53 53 53 75
10
2. table for h'(k)
20 50 53 75 100 67 105 3 36 39
0 3
1 20 20 20 20
2
3 36 39
4 53 53 53 53 50 50 50 53
5
6 75 75 75 75 75 75 67
7
8
9 100 100 100 100 105
10

Cycle

If you now wish to insert the element 6, then you get into a cycle. In the last row of the table we find the same initial situation as at the beginning again.

h\left(6\right)=6\mod 11=6
h'\left(6\right)=\lfloor\frac{6}{11}\rfloor\mod 11=0

considered key table 1 table 2
old value new value old value new value
6 50 6 53 50
53 75 53 67 75
67 100 67 105 100
105 6 105 3 6
3 36 3 39 36
39 105 39 100 105
100 67 100 75 67
75 53 75 50 53
50 39 50 36 39
36 3 36 6 3
6 50 6 53 50

Generalizations and applications

Generalizations of cuckoo hashing that use more than 2 alternative hash functions can be expected to utilize a larger part of the capacity of the hash table efficiently while sacrificing some lookup and insertion speed. Using just three hash functions increases the load to 91%. Another generalization of cuckoo hashing consists in using more than one key per bucket. Using just 2 keys per bucket permits a load factor above 80%.

Other algorithms that use multiple hash functions include the Bloom filter. Cuckoo hashing can be used to implement a data structure equivalent to a Bloom filter. A simplified generalization of cuckoo hashing called skewed-associative cache is used in some CPU caches.

A study by Zukowski et al.[4] has shown that cuckoo hashing is much faster than chained hashing for small, cache-resident hash tables on modern processors. Kenneth Ross[5] has shown bucketized versions of cuckoo hashing (variants that use buckets that contain more than one key) to be faster than conventional methods also for large hash tables, when space utilization is high. The performance of the bucketized cuckoo hash table was investigated further by Askitis,[6] with its performance compared against alternative hashing schemes.

A survey by Mitzenmacher[7] presents open problems related to cuckoo hashing as of 2009.

See also

References

  1. ^ a b Pagh, Rasmus; Rodler, Flemming Friche (2001) (PDF, PS). Cuckoo Hashing. doi:10.1.1.25.4189. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.25.4189. Retrieved 2008-10-16. 
  2. ^ Pagh and Rodler: "There is no need to allocate new tables for the rehashing: We may simply run through the tables to delete and perform the usual insertion procedure on all keys found not to be at their intended position in the table."
  3. ^ Kutzelnigg, Reinhard (2006). "Fourth Colloquium on Mathematics and Computer Science". pp. 403–406. http://www.dmtcs.org/dmtcs-ojs/index.php/proceedings/article/viewFile/590/1710 
  4. ^ Zukowski, Marcin; Heman, Sandor; Boncz, Peter (2006-06) (PDF). Architecture-Conscious Hashing. Proceedings of the International Workshop on Data Management on New Hardware (DaMoN). http://www.cs.cmu.edu/~damon2006/pdf/zukowski06archconscioushashing.pdf. Retrieved 2008-10-16. 
  5. ^ Ross, Kenneth (2006-11-08) (PDF). Efficient Hash Probes on Modern Processors. IBM Research Report RC24100. RC24100. http://domino.research.ibm.com/library/cyberdig.nsf/papers/DF54E3545C82E8A585257222006FD9A2/$File/rc24100.pdf. Retrieved 2008-10-16. 
  6. ^ Askitis, Nikolas (2009). Fast and Compact Hash Tables for Integer Keys. 91. 113–122. ISBN 978-1-920682-72-9. http://crpit.com/confpapers/CRPITV91Askitis.pdf. 
  7. ^ Mitzenmacher, Michael (2009-09-09) (PDF). Some Open Questions Related to Cuckoo Hashing | Proceedings of ESA 2009. http://www.eecs.harvard.edu/~michaelm/postscripts/esa2009.pdf. Retrieved 2010-11-10. 

External links